Multi-class Classification in Big Data
Authors
Abstract
This paper proposes an online multi-class classifier whose computational complexity is sublinear in the number of training objects. The approach combines two-class probabilistic classifiers. Pairwise coupling is a popular multi-class classification method that combines the comparisons for every pair of classes. Unfortunately, pairwise coupling often suffers from incompatibility: in some regions of the input space, the estimated probabilities do not sum to one. We propose an optimal approximation of the class probabilities at each point of the object space. We also propose a new probabilistic interpretation of the Support Vector Machine (SVM) for obtaining class probabilities, and show how the SVM can be viewed as a maximum likelihood estimate of a class of probabilistic models. As a computational method for big data, we use stochastic gradient descent (SGD), directly minimizing the primal SVM objective. Unfortunately, the hinge loss of the true SVM classifier does not allow the SGD procedure to determine the classifier bias. We propose a piecewise quadratic loss that overcomes this obstacle and makes it possible to obtain the bias from the SGD procedure.
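The pipeline described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact form of the piecewise quadratic loss, the probabilistic link, or the optimal approximation used to reconcile the pairwise probabilities, so the code below makes three labeled assumptions: the squared hinge loss stands in for the piecewise quadratic loss, a logistic sigmoid maps SVM scores to pairwise probabilities, and the pairwise probabilities are simply summed and normalized so that each row sums to one. The function names `train_binary_sgd`, `fit_pairwise`, and `predict_proba` are hypothetical and are not taken from the paper.

```python
import numpy as np


def train_binary_sgd(X, y, lam=0.01, eta0=0.02, epochs=20, seed=0):
    """SGD on a primal SVM-style objective, with the bias learned by SGD.

    Assumption: the squared hinge loss max(0, 1 - y*f(x))**2 is used as a
    stand-in for the paper's piecewise quadratic loss. Labels y are in {-1, +1}.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(n):
            t += 1
            eta = eta0 / (1.0 + eta0 * lam * t)  # slowly decaying step size
            slack = max(0.0, 1.0 - y[i] * (X[i] @ w + b))
            # Gradient of lam/2 * ||w||^2 + slack^2 with respect to w and b.
            # The loss is differentiable, so the bias gets a usable gradient.
            w -= eta * (lam * w - 2.0 * slack * y[i] * X[i])
            b -= eta * (-2.0 * slack * y[i])
    return w, b


def fit_pairwise(X, y, **sgd_kwargs):
    """Train one binary classifier for every pair of classes."""
    classes = np.unique(y)
    models = {}
    for i in range(len(classes)):
        for j in range(i + 1, len(classes)):
            mask = (y == classes[i]) | (y == classes[j])
            labels = np.where(y[mask] == classes[i], 1.0, -1.0)
            models[(i, j)] = train_binary_sgd(X[mask], labels, **sgd_kwargs)
    return classes, models


def predict_proba(X, classes, models):
    """Combine pairwise probabilities into class probabilities.

    Assumptions: a logistic sigmoid turns scores into pairwise probabilities,
    and a plain sum-and-normalize step forces each row to sum to one. This is
    a simple heuristic, not the paper's optimal approximation.
    """
    votes = np.zeros((X.shape[0], len(classes)))
    for (i, j), (w, bias) in models.items():
        r = 1.0 / (1.0 + np.exp(-(X @ w + bias)))  # estimate of P(class i | i or j)
        votes[:, i] += r
        votes[:, j] += 1.0 - r
    return votes / votes.sum(axis=1, keepdims=True)
```

Calling `fit_pairwise(X, y)` and then `predict_proba(X, classes, models)` returns one row of class probabilities per object. Because each training step touches a single object, the cost of one update does not depend on the size of the training set, which is the property that makes SGD attractive for big data.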
Similar resources
Exploiting Associations between Class Labels in Multi-label Classification
Multi-label classification has many applications in text categorization, biology, and medical diagnosis, in which multiple class labels can be assigned to each training instance simultaneously. As it is often the case that there are relationships between the labels, extracting the existing relationships between the labels and taking advantage of them during the training or prediction phases ...
MULTI CLASS BRAIN TUMOR CLASSIFICATION OF MRI IMAGES USING HYBRID STRUCTURE DESCRIPTOR AND FUZZY LOGIC BASED RBF KERNEL SVM
Medical Image segmentation is to partition the image into a set of regions that are visually obvious and consistent with respect to some properties such as gray level, texture or color. Brain tumor classification is an imperative and difficult task in cancer radiotherapy. The objective of this research is to examine the use of pattern classification methods for distinguishing different types of...
Feature-based Malicious URL and Attack Type Detection Using Multi-class Classification
Nowadays, malicious URLs are a common threat to businesses, social networks, net-banking, etc. Existing approaches have focused on binary detection, i.e., whether the URL is malicious or benign. Very little literature focuses on the detection of malicious URLs and their attack types. Hence, it becomes necessary to know the attack type and adopt an effective countermeasure. This pa...
Big Models for Big Data using Multi objective averaged one dependence estimators
Although many researchers have tried to explore the possibilities of multi-objective feature selection, its full capabilities in data mining applications remain to be explored, rather than developing new methods. In this paper, the multi-objective evolutionary algorithm ENORA is used to select the features in a multi-class classification problem. The fusion of AnDE (av...
Sentiment Analysis of Social Networking Data Using Categorized Dictionary
Sentiment analysis is the process of analyzing a person’s perception or belief about a particular subject matter. However, finding correct opinion or interest from multi-facet sentiment data is a tedious task. In this paper, a method to improve the sentiment accuracy by utilizing the concept of categorized dictionary for sentiment classification and analysis is proposed. A categorized dictiona...
A Knowledge Based Approach for Tackling Mislabeled Multi-class Big Social Data
The performance of classification models relies heavily on the quality of training data. However, label imperfection is an inherent fault of training data that is impossible to handle manually in a big data environment. Various methods have been proposed to remove label noise in order to improve classification quality, with the side effect of reducing the volume of data. In this paper, we propose a...
Publication date: 2016